51 research outputs found

    Evaluation and Acceleration of High-Throughput Fixed-Point Object Detection on FPGAs

    RECONFIGURABLE COMPUTING Introduction

    Compiler Generated Systolic Arrays For Wavefront Algorithm Acceleration on FPGAs

    Wavefront algorithms, such as the Smith-Waterman algorithm, are commonly used in bioinformatics for exact local and global sequence alignment. These algorithms are highly computationally intensive and are therefore excellent candidates for FPGA-based code acceleration. However, there is no standard form of these algorithms: they are used in a wide variety of situations with various constraints. It is therefore not practical to have a standard kernel that can be mapped to an FPGA, hence the importance of being able to compile such codes from a high-level language. ROCCC is a C-to-VHDL compiler that optimizes and parallelizes the most frequently executed kernel loops in applications in areas such as multimedia, scientific, and high-performance computing. In this paper we describe the transformations performed by ROCCC, which transform the kernel of the Smith-Waterman algorithm into a hardware systolic array that is mapped onto the FPGA on the SGI Altix RASC blade. We report a throughput increase of over 3,000x relative to a 2.8 GHz Xeon.
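
The wavefront dependency pattern mentioned in this abstract can be illustrated with a minimal software sketch. This is not ROCCC output and not the paper's hardware design; the function name and the scoring parameters (`match`, `mismatch`, `gap`) are assumptions chosen for illustration. The sketch fills the Smith-Waterman table one anti-diagonal at a time, which is the traversal order a systolic array would exploit, since all cells on the same anti-diagonal are independent of each other.

```python
def smith_waterman_wavefront(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score of strings a and b.

    Scoring values are illustrative assumptions (linear gap penalty).
    The table is filled anti-diagonal by anti-diagonal (the "wavefront").
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # row 0 and column 0 stay at 0
    best = 0
    # Cells with i + j == d depend only on anti-diagonals d-1 and d-2,
    # so every cell visited by the inner loop could be computed in parallel.
    for d in range(2, rows + cols - 1):
        i_lo = max(1, d - (cols - 1))
        i_hi = min(rows - 1, d - 1)
        for i in range(i_lo, i_hi + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # local alignment may restart anywhere
                H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            best = max(best, H[i][j])
    return best
```

In this sketch the inner loop runs sequentially; a hardware mapping would, under the assumptions above, assign the cells of each anti-diagonal to separate processing elements.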

    Fast Area Estimation to support Compiler Optimizations in FPGA-based Reconfigurable Systems

    Several projects have developed compiler tools that translate high-level languages down to hardware description languages for mapping onto FPGA-based reconfigurable computers. These compiler tools can apply extensive transformations that exploit the parallelism inherent in the computations. However, the transformations can have a major impact on the chip area (number of logic blocks) used on the FPGA. It is therefore imperative that the compiler user be provided with feedback indicating how much space is being used. In this paper we present a fast compile-time area estimation technique to guide the compiler optimizations. Experimental results show that our technique achieves an accuracy within 2.5% for small image-processing operators, and within 5.0% for larger benchmarks, as compared to the usual post-compilation synthesis tool estimations. The estimation time is on the order of milliseconds, as compared to several minutes for a synthesis tool.

    Experimental Evaluation of Blocking and Non-Blocking Multithreaded Code Execution

    The objective of multithreaded execution models is to mask the latency of interprocessor communications and remote memory accesses in large-scale multiprocessors. Several such models combine aspects of dataflow-like execution with the von Neumann model in an attempt to provide both efficient synchronization (as in the dataflow model) and efficient exploitation of program locality (as in the von Neumann model). We refer to these models as data-driven multithreading models. One of the factors that distinguishes these models is the thread execution strategy: a thread can be either non-blocking or blocking. Another factor is the architectural support for dynamic synchronization: the locality present within and among threads can potentially be exploited by a proper storage hierarchy for the synchronization store (operand storage). Two storage models have been proposed for data-driven multithreaded execution. One is frame based, in which all the threads belonging to a code-block share one stora..

    Reconfigurable Computing
